Abstract — A stochastic flow network is a directed graph with
incoming edges (inputs) and outgoing edges (outputs). Tokens enter
through the input edges, travel stochastically in the network,
and can exit the network through the output edges. Each node
in the network is a splitter, namely, a token can enter a node 
through an incoming edge and exit on one of the output edges 
according to a predefined probability distribution. Stochastic flow 
networks can be easily implemented by DNA-based chemical re- 
actions, with promising applications in molecular computing and 
stochastic computing. In this paper, we address a fundamental 
synthesis question: Given a finite set of possible splitters and 
an arbitrary rational probability distribution, design a stochastic 
flow network, such that every token that enters the input edge 
will exit the outputs with the prescribed probability distribution. 

The problem of probability transformation dates back to von 
Neumann's 1951 work and was followed, among others, by Knuth 
and Yao in 1976. Most existing works have been focusing on the 
"simulation" of target distributions. In this paper, we design 
optimal-sized stochastic flow networks for "synthesizing" target 
distributions. We show that when each splitter has two outgoing
edges and is unbiased, an arbitrary rational probability $\frac{a}{b}$ with
$a \le b \le 2^n$ can be realized by a stochastic flow network of
size $n$, and this size is optimal. Compared to other stochastic systems,
feedback (cycles in networks) strongly improves the expressibility 
of stochastic flow networks. 

Index Terms — Stochastic Flow Network, Random-walk Graph, 
Probability Synthesis. 



I. Introduction 

The problem of probability transformation dates back to 
von Neumann [10] in 1951, who first considered the problem
of simulating an unbiased coin by using a biased coin with 
unknown probability. He observed that when one focuses on 
a pair of coin tosses, the events HT and TH have the same 
probability (H is for 'head' and T is for 'tail'); hence, HT 
produces the output symbol 0 and TH produces the output
symbol 1. The other two possible events, namely, HH and 
TT, are ignored, namely, they do not produce any output 
symbols. More efficient algorithms for simulating an unbiased 
coin from a biased coin were proposed by Hoeffding and 
Simons Q, Elias Q, Stout and Warren lTT6l and Peres ifTTl . 
In 1976, Knuth and Yao [8] presented a simple procedure for
generating sequences with arbitrary probability distributions
from an unbiased coin (the probability of each of H and T is $\frac{1}{2}$).
They showed that the expected number of coin tosses is upper- 
bounded by the entropy of the target distribution plus two. Han 
and Hoshi [6] and Abrahams [1] generalized their approach
and demonstrated how to generate an arbitrary probability dis- 
tribution using a general M -sided biased coin. All these works 
have been focusing on the "simulation" side of probability 
transformation, and their goal is to minimize the expected 
number of coin tosses for generating a certain number of target 
distributions. 

There are a few works that considered the problem of prob- 
ability transformation from a synthetic perspective, namely, 
designing a physical system for "synthesizing" target distri- 
butions, by connecting certain probabilistic elements. Such 
probabilistic elements can be electrical ones based on in- 
ternal thermal noise or molecular ones based on inherent 
randomness in chemical reactions. In this scenario, the size 
of the construction becomes a central issue. In 1962, Gill [4],
[5] discussed the problem of generating rational probabilities
using a sequential state machine. Later, Sheng [13] considered
applying threshold logic elements as a discrete probability 
transformer. Recently, Wilhelm and Bruck IfTTl proposed a 
procedure for synthesizing stochastic switching circuits to 
realize desired discrete probabilities. More properties and 
constructions of stochastic switching circuits were studied 
by Zhou, Loh and Bruck 0, ED, Q9); Qmn et. al. HI 
studied combinational logic for transforming a set of given 
probabilities into target probabilities. Motivated by stochastic 
computing based on chemical reaction networks [14], in this
paper we study stochastic flow networks. A stochastic flow
network is a directed graph with incoming edges (inputs) and
outgoing edges (outputs). Tokens enter through the input edges,
travel stochastically in the network, and can exit the network
through the output edges. Each node in the network is a 
splitter, namely, a token can enter a node through an incoming 
edge and exit on one of the output edges according to a 
predefined probability distribution. We address a fundamental 
synthesis question: Given a finite set of possible splitters 
and an arbitrary rational probability distribution, design an 
optimal-sized stochastic flow network, such that every token
that enters the input edge will exit the outputs with the
prescribed probability distribution.

Fig. 1. An instance of a stochastic flow network that consists of three p-splitters
for any p and generates probability $\frac{1}{2}$.

Stochastic flow networks can be easily implemented by 
chemical reaction networks, where each splitter corresponds 
to two types of molecules, and incoming tokens (another type 
of molecules) can react with both, hence react with one of 
them with a certain probability. Compared to the synthetic 
stochastic systems described above, stochastic flow networks 
demonstrate strong powers in expressing an arbitrary rational 
target distribution. Fig. 1 depicts von Neumann's algorithm
in the language of a stochastic flow network that consists of
three p-splitters for any p and generates probability $\frac{1}{2}$. Here,
a p-splitter indicates a splitter with two outgoing edges with
probabilities $p$ and $(1 - p)$. In this construction, we have two
outputs $\{\beta_1, \beta_2\} = \{0, 1\}$ (corresponding to the labels 0 and
1, respectively). Each incoming token has the same
probability $pq$, where $q = 1 - p$, to reach either output 0 or output 1 directly,
and it has probability $1 - 2pq$ to come back to the starting
point. Eventually, the probability for the token to reach each
of the outputs is $\frac{1}{2}$. In general, the outputs of a stochastic flow
network have labels denoted by $\{\beta_1, \beta_2, \ldots, \beta_m\}$. A token will
reach an output $\beta_k$ ($1 \le k \le m$) with probability $q_k$, and we
call $q_k$ the probability of $\beta_k$ and call $\{q_1, q_2, \ldots, q_m\}$ the output
probability distribution of the network, where $\sum_{k=1}^{m} q_k = 1$.
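
As a quick check of the argument above, the following sketch sums the geometric series for the network of Fig. 1 in exact rational arithmetic. The function name `von_neumann_output_probs` is an illustrative assumption and does not come from the original construction; the sketch assumes $0 < p < 1$.

```python
from fractions import Fraction

def von_neumann_output_probs(p):
    """Return (P[output 0], P[output 1]) for the three-splitter loop of Fig. 1.

    Assumes 0 < p < 1; p is converted to an exact Fraction.
    """
    p = Fraction(p)
    q = 1 - p
    direct = p * q           # probability of reaching a given output in one pass
    back = 1 - 2 * direct    # probability of returning to the starting point
    # direct * (1 + back + back^2 + ...) = direct / (1 - back)
    reach = direct / (1 - back)
    return reach, reach
```

Whatever bias $p$ is used, the two outputs come out equal, which is the point of von Neumann's trick.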

In this paper we assume, without loss of generality, that
the probability of each splitter is $\frac{1}{2}$ ($\frac{1}{2}$-splitters can be
implemented using three p-splitters for any p). Our goal is to
realize the target probabilities or distributions by constructing
a network of minimum size. In addition, we study the expected
latency, namely the expected number of splitters a token needs
to pass before reaching the output (we also call it the expected
operating time).

The main contributions of the paper are 

1) General optimal construction: For any desired rational 
probability, an optimal-sized construction of stochastic 
flow network is provided. 

2) The power of feedback: We show that with feedback 
(loops), stochastic flow networks can generate signifi- 
cantly more probabilities than those without feedback. 

3) Constructions with well-bounded expected latency: We 
give two constructions whose expected latencies are 
well-bounded by constants. As a price, they use a few 
more splitters than the optimal-sized one. 

4) Constructions for arbitrary rational distributions: We
generalize our constructions so that they can realize
an arbitrary rational probability distribution
$\{q_1, q_2, \ldots, q_m\}$.

The remainder of this paper is organized as follows. In
Section II we introduce some preliminaries including Knuth
and Yao's scheme and a few mathematical tools for calculating
the distribution of a given stochastic flow network. Section
III introduces an optimal-sized construction of stochastic flow
networks for synthesizing an arbitrary rational probability,
and it demonstrates that feedback significantly enhances the
expressibility of stochastic flow networks. Section IV analyzes
the expected latency of the optimal-sized construction. Section
V gives two constructions whose expected latencies are upper
bounded by constants. Section VI presents the generalizations
of our results to arbitrary rational probability distributions. The
concluding remarks and the comparison of different stochastic
systems are given in Section VII.

II. Preliminaries 

In this section, we introduce some preliminaries, including
Knuth and Yao's scheme for simulating an arbitrary distribution
from an unbiased coin, and how to use absorbing Markov
chains or Mason's rule to calculate the output distribution of
a given stochastic flow network.

A. Knuth and Yao's Scheme

In 1976, Knuth and Yao proposed a simple procedure for
simulating an arbitrary distribution from an unbiased coin (the
probability of each of H and T is $\frac{1}{2}$) [8]. They introduced a concept
called generating tree for representing the algorithm [2]. The
leaves of the tree are marked by the output symbols, and the 
path from the root node to the leaves indicates the sequences 
of bits generated by the unbiased coin. Starting from the root 
node, the scheme selects edges to follow based on the coin 
tosses until it reaches one of the leaves. Then it outputs the 
symbol marked on that leaf. 

In general, we assume that the target distribution is
$\{p_1, p_2, \ldots, p_m\}$. Since all the leaves of the tree have
probabilities of the form $2^{-k}$ (if the depth of the leaf is $k$), we
split each probability $p_i$ into atoms of this form. Specifically,
let the binary expansion of the probability $p_i$ be

$$p_i = \sum_{j \ge 1} p_i^{(j)},$$

where $p_i^{(j)} = 2^{-j}$ or 0. Then for each probability $p_i$, we get a
group of atoms $\{p_i^{(j)} : j \ge 1\}$. For these atoms, we allot them
to leaves with label $\beta_i$ on the tree. Hence, the probability of
generating $\beta_i$ is $p_i$. We can see that the depths of all the atoms
satisfy the Kraft inequality [2], i.e.,

$$\sum_{i=1}^{m} \sum_{j \ge 1} p_i^{(j)} = 1.$$

So we can always construct such a tree with all the atoms
allotted. Knuth and Yao showed that the expected number of
fair bits required by the procedure (i.e., the expected depth of
the tree) to generate a random variable $X$ with distribution
$\{p_1, p_2, \ldots, p_m\}$ lies between $H(X)$ and $H(X) + 2$, where
$H(X)$ is the entropy of the target distribution.

Fig. 2. The generating tree to generate a $(\frac{2}{3}, \frac{1}{3})$ distribution.

Fig. 2 depicts a generating tree that generates a distribution
$\{\frac{2}{3}, \frac{1}{3}\}$, where the atoms for $\frac{2}{3}$ are $\{\frac{1}{2}, \frac{1}{8}, \frac{1}{32}, \ldots\}$, and the
atoms for $\frac{1}{3}$ are $\{\frac{1}{4}, \frac{1}{16}, \frac{1}{64}, \ldots\}$. We see that the construction of
generating trees is, in some sense, a special case of stochastic
flow networks without cycles. If we consider each node in
the generating tree as a splitter, then each token that enters the 
tree from the root node will reach the outputs with the target 
distribution. While Knuth and Yao's scheme aims to minimize 
the expected depth of the tree (or in our framework, we call it 
the expected latency of the network), our goal is to optimize 
the size of the construction, i.e., the number of nodes in the 
network. 
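
To make the splitting into atoms concrete, here is a minimal sketch that lists the atoms $2^{-j}$ in the binary expansion of a probability, truncated at a chosen depth. The helper name `atoms` and the truncation parameter `depth` are assumptions introduced for illustration; a rational probability such as $\frac{2}{3}$ has infinitely many atoms, so only a prefix is returned.

```python
from fractions import Fraction

def atoms(p, depth):
    """Atoms 2^-j (1 <= j <= depth) in the binary expansion of p, for 0 <= p < 1."""
    p = Fraction(p)
    found = []
    for j in range(1, depth + 1):
        atom = Fraction(1, 2 ** j)
        if p >= atom:        # the j-th binary digit of p is 1
            found.append(atom)
            p -= atom
    return found

# Atoms of the (2/3, 1/3) distribution up to depth 6
two_thirds = atoms(Fraction(2, 3), 6)
one_third = atoms(Fraction(1, 3), 6)
# All atoms together sum to at most 1, as the Kraft inequality requires
kraft_sum = sum(two_thirds + one_third)
```

Each atom corresponds to one leaf of the generating tree at depth $j$, labeled with the output symbol whose probability it belongs to.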

B. Absorbing Markov Chain 

Let's consider a stochastic flow network with $n$ splitters and
$m$ outputs, in which each splitter is associated with a state
number in $\{1, 2, \ldots, n\}$ and each output is associated with a
state number in $\{n+1, n+2, \ldots, n+m\}$. When a token reaches
splitter $i$ with $1 \le i \le n$, we say that the current state of this
network is $i$. When it reaches output $k$ with $1 \le k \le m$,
we say that the current state of this network is $n + k$. Note
that the current state of the network only depends on the last
state, and when the token reaches one output it will stay there
forever. So we can describe token flows in this network using
an absorbing Markov chain. If the current state of the network
is $i$, then the probability of reaching state $j$ at the next instant
of time is given by $p_{ij}$. Here, $p_{ij} = p_H$ ($p_{ij} = p_T$) if and
only if state $i$ and state $j$ are connected by an edge H (T).

Clearly, the network with n splitters and m outputs with 
different labels can be described by an absorbing Markov 
chain, where the first n states are transient states and the last 
$m$ states are absorbing states. And we have

$$\sum_{j=1}^{n+m} p_{ij} = 1, \quad \forall i = 1, 2, \ldots, n+m,$$
$$p_{ij} = 0, \quad \forall i > n \text{ and } i \ne j,$$
$$p_{ii} = 1, \quad \forall i > n.$$


The transition matrix of this Markov chain is given by

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix},$$

where $Q$ is an $n \times n$ matrix, $R$ is an $n \times m$ matrix, $0$ is an
$m \times n$ zeros matrix and $I$ is an $m \times m$ identity matrix.

Let $B_{ij}$ be the probability for an absorbing Markov chain
reaching the state $j + n$ if it starts in the transient state $i$. Then
$B$ is an $n \times m$ matrix, and

$$B = (I - Q)^{-1} R.$$

Assume this Markov chain starts from state 1 and let $S_j$ be
the probability for it reaching the absorbing state $j + n$. Then
$S$ is the distribution of the network

$$S = [1, 0, \ldots, 0] B = e_1 (I - Q)^{-1} R.$$

Given a stochastic flow network, we can use the formula
above to calculate its probability distribution. For example,
from the transition matrix of the network in Fig. 3
we can obtain the probability distribution

$$S = e_1 (I - Q)^{-1} R = \left(\tfrac{2}{3}, \tfrac{1}{3}\right).$$

Fig. 3. The stochastic flow network to generate a $(\frac{2}{3}, \frac{1}{3})$ distribution.
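
The formula $S = e_1 (I - Q)^{-1} R$ can be evaluated exactly with rational arithmetic. The sketch below is one possible implementation using Gauss-Jordan elimination; the function name `absorption_distribution` is hypothetical, and the two-splitter wiring used as input is an assumption chosen to be consistent with a $(\frac{2}{3}, \frac{1}{3})$ output, not a transcription of Fig. 3.

```python
from fractions import Fraction

def absorption_distribution(Q, R):
    """Return S = e_1 (I - Q)^{-1} R as a list of exact Fractions."""
    n, m = len(Q), len(R[0])
    # Augmented matrix [I - Q | R]; solving it yields B = (I - Q)^{-1} R.
    A = [[Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
         + [Fraction(R[i][k]) for k in range(m)] for i in range(n)]
    for col in range(n):
        # Pick a row with a nonzero pivot (exists when I - Q is invertible).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return A[0][n:]  # first row of B, since the token starts in state 1

half = Fraction(1, 2)
# Assumed wiring: splitter 1 -> {output 0, splitter 2},
#                 splitter 2 -> {splitter 1, output 1}
Q = [[0, half], [half, 0]]
R = [[half, 0], [0, half]]
S = absorption_distribution(Q, R)
```

Exact fractions are used instead of floating point so that the result can be compared directly with the rational target distribution.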

C. Mason's Rule

Mason's gain rule is a method used in control theory to 
find the transfer function of a given control system. It can be 
applied to any signal flow graph. Generally, we describe it as 
follows (see more details about Mason's rule in lfT31l ): 

Let $H(z)$ denote the transfer function of a signal flow graph.
Define the following notations:

1) $\Delta(z)$ = determinant of the graph.

2) $L$ = number of forward paths, with $P_k(z)$, $1 \le k \le L$,
denoting the forward path gains.

3) $\Delta_k(z)$ = determinant of the graph that remains after
deleting the $k$th forward path $P_k(z)$.

To calculate the determinant of a graph $\Delta(z)$, we list all
the loops in the graph and their gains denoted by $L_i$, all pairs
of non-touching loops $L_i L_j$, all triples of pairwise non-touching loops
$L_i L_j L_k$, and so forth. Then

$$\Delta(z) = 1 - \sum_{i:\,\text{loops}} L_i + \sum_{(i,j):\,\text{non-touching}} L_i L_j - \cdots$$

The transfer function is

$$H(z) = \frac{\sum_{k=1}^{L} P_k(z) \Delta_k(z)}{\Delta(z)},$$

called Mason's rule.

Let's treat a stochastic flow network as a control system
with input $U(z) = 1$. Applying Mason's rule to this system,
we can get the probability that one token reaches output $k$ with
$1 \le k \le m$. Take the network in Fig. 3 as an example again.
In this network, we want to calculate the probability for a
token to reach output 1 (for short, we call it the probability
of 1). Since there is only one loop with gain $\frac{1}{4}$ and only
one forward path with forward gain $\frac{1}{4}$, we can obtain that the
probability of 1 is

$$P = \frac{\frac{1}{4}}{1 - \frac{1}{4}} = \frac{1}{3},$$

which accords with the result of absorbing Markov chains. In
fact, it can be proved that Mason's rule and the matrix
form based on absorbing Markov chains are equivalent.

Fig. 4. Tree structure used to realize probability $\frac{x}{2^n}$ for an integer $x$ ($0 \le x \le 2^n$).

III. Optimal-Sized Construction and Feedback 

In this section we present an optimal-sized construction of 
stochastic flow networks. It consists of splitters with proba- 
bility 1/2 and computes an arbitrary rational probability. We 
demonstrate that feedback (loops) in stochastic flow networks 
significantly enhances their expressibility. To see that, let's first
study stochastic flow networks without loops, and then those 
with loops. 

A. Loop-free networks 

Here, we want to study the expressive power of loop-free 
networks. We say that there are no loops in a network if no 
tokens can pass any position in the network more than once. 
For loop-free networks, we have the following theorem: 

Theorem 1. For a loop-free network with $n$ $\frac{1}{2}$-splitters, any
probability $\frac{x}{2^n}$ with integer $x$ ($0 \le x \le 2^n$) can be realized,
and only probabilities $\frac{x}{2^n}$ with integer $x$ ($0 \le x \le 2^n$) can be
realized.

Proof: a) In order to prove that every probability $\frac{x}{2^n}$ with
integer $x$ ($0 \le x \le 2^n$) can be realized, we only need to
provide the constructions of the networks.

1) Construct a tree, as shown in Fig. 4. In this tree structure,
each token will reach $A_i$ ($1 \le i \le n$) with probability
$2^{-i}$, and reach $A_{n+1}$ with probability $2^{-n}$.

2) Let $\frac{x}{2^n} = \sum_{j=1}^{n} \gamma_j 2^{-j}$, where $\gamma_j = 0$ or 1. For each $j$
with $1 \le j \le n$, if $\gamma_j = 1$, we connect $A_j$ to output 0;
otherwise, we connect $A_j$ to output 1. Then we connect
$A_{n+1}$ to output 1. Eventually, the probability for a token
to reach output 0 is

$$P = \sum_{j=1}^{n} \frac{\gamma_j}{2^j} = \sum_{i=0}^{n-1} \frac{\gamma_{n-i}}{2^{n-i}} = \frac{x}{2^n}.$$

Using the procedure above, we can construct a network such
that its probability is $\frac{x}{2^n}$. Actually, it is a special case of Knuth
and Yao's construction [8].
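
The two steps of this construction can be sketched as follows. The function names `loop_free_connections` and `probability_of_output0` are illustrative assumptions; the sketch handles $0 \le x < 2^n$, for which the binary expansion with $n$ digits exists.

```python
from fractions import Fraction

def loop_free_connections(x, n):
    """Wire leaves A_1..A_{n+1} of the tree to output 0 or 1 so that P(output 0) = x / 2^n."""
    if not 0 <= x < 2 ** n:
        raise ValueError("this sketch handles 0 <= x < 2^n")
    # gamma_j is the j-th digit after the binary point of x / 2^n
    gammas = [(x >> (n - j)) & 1 for j in range(1, n + 1)]
    wiring = {j: (0 if g else 1) for j, g in enumerate(gammas, start=1)}
    wiring[n + 1] = 1    # the last leaf always goes to output 1
    return wiring

def probability_of_output0(wiring, n):
    """Sum the reach probabilities of all leaves wired to output 0."""
    # A_j is reached with probability 2^-j for j <= n, and A_{n+1} with 2^-n
    reach = {j: Fraction(1, 2 ** j) for j in range(1, n + 1)}
    reach[n + 1] = Fraction(1, 2 ** n)
    return sum((reach[j] for j, out in wiring.items() if out == 0), Fraction(0))
```

For example, `loop_free_connections(5, 4)` wires $A_2$ and $A_4$ to output 0, since $\frac{5}{16} = 0.0101$ in binary.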



b) Now, we prove that only probabilities $\frac{x}{2^n}$ with integer
$x$ ($0 \le x \le 2^n$) can be realized. If this is true, then $\frac{x}{2^n}$ with
odd $x$ cannot be realized with fewer than $n$ splitters. It means
that in the construction above, the network size $n$ is optimal.

According to Mason's rule, for a network without loops, the
probability for a token reaching one output is

$$P = \sum_{k} P_k,$$

where $P_k$ is the path gain of a forward path from the root to
the output. Given $n$ splitters, the length of each forward path
should be at most $n$. Otherwise, there must be a loop along
this forward path (the token would have to pass the same splitter at least
twice). For each $k$, $P_k$ can be written as $\frac{x_k}{2^n}$ for some $x_k$.
As a result, we can get that $P$ can be written as $\frac{x}{2^n}$ for some
$x$. ■

B. Networks with loops 

We showed that stochastic flow networks without loops
can only realize binary probabilities. Here, we show that
feedback (loops) plays an important role in enhancing their
expressibility. For example, with feedback, we can realize
probability $\frac{2}{3}$ with only two splitters, as shown in Fig. 5.
But without loops, it is impossible (or requires an infinite
number of splitters) to realize $\frac{2}{3}$. More generally, for any
desired rational probability $\frac{a}{b}$ with integers $0 \le a \le b \le 2^n$,
we have the following theorem:

Theorem 2. For a network with $n$ $\frac{1}{2}$-splitters, any rational
probability $\frac{a}{b}$ with integers $0 \le a \le b \le 2^n$ can be realized,
and only rational probabilities $\frac{a}{b}$ with integers $0 \le a \le b \le 2^n$
can be realized.

Proof: a) We prove that every rational probability $\frac{a}{b}$ with
integers $0 \le a \le b \le 2^n$ can be realized. When $b = 2^n$, the
problem becomes trivial due to the result of Theorem 1. In the
following proof, without loss of generality (w.l.o.g.), we only
consider the case in which $2^{n-1} < b < 2^n$ for some $n$.

We first show that all probability distributions $\{\frac{x}{2^n}, \frac{y}{2^n}, \frac{z}{2^n}\}$
with integers $x, y, z$ s.t. $x + y + z = 2^n$ can be realized with
$n$ splitters. Now let's construct the network iteratively.

When $n = 1$, by enumerating all the possible connections,
we can verify that all the following probability distributions
can be realized:

$$\{0, 0, 1\}, \{0, 1, 0\}, \{1, 0, 0\}, \{0, \tfrac{1}{2}, \tfrac{1}{2}\}, \{\tfrac{1}{2}, 0, \tfrac{1}{2}\}, \{\tfrac{1}{2}, \tfrac{1}{2}, 0\}.$$

So all the probability distributions $\{\frac{x}{2}, \frac{y}{2}, \frac{z}{2}\}$ with integers
$x, y, z$ s.t. $x + y + z = 2$ can be realized.

Assume that all the probability distributions $\{\frac{x}{2^k}, \frac{y}{2^k}, \frac{z}{2^k}\}$
with integers $x, y, z$ s.t. $x + y + z = 2^k$ can be realized by a
network with $k$ splitters, then we show that any desired probability
distribution $\{\frac{x}{2^{k+1}}, \frac{y}{2^{k+1}}, \frac{z}{2^{k+1}}\}$ s.t. $x + y + z = 2^{k+1}$
can be realized with one more splitter. Since $x + y + z = 2^{k+1}$,
at least one of $x, y, z$ is even. W.l.o.g., we let $x$ be even. Then
there are two cases to consider: either both $y$ and $z$ are even,
or both $y$ and $z$ are odd.
Fig. 5. (a) The network to realize $\{\frac{x}{2^{k+1}}, \frac{y}{2^{k+1}}, \frac{z}{2^{k+1}}\}$ iteratively. (b) The
network to realize $\{\frac{a}{b}, 1 - \frac{a}{b}\}$.



When both $y$ and $z$ are even, the problem is trivial
since the desired probability distribution can be written as
$\{\frac{x/2}{2^k}, \frac{y/2}{2^k}, \frac{z/2}{2^k}\}$, which can be realized by a network with $k$
splitters.

When both $y$ and $z$ are odd, w.l.o.g., we assume that $z \le y$.
In this case, we construct a network to realize probability
distribution $\{\frac{x/2}{2^k}, \frac{(y-z)/2}{2^k}, \frac{z}{2^k}\}$ with $k$ splitters. By connecting
the last output with probability $\frac{z}{2^k}$ to an additional splitter, we
can get a new distribution $\{\frac{x}{2^{k+1}}, \frac{y-z}{2^{k+1}}, \frac{z}{2^{k+1}}, \frac{z}{2^{k+1}}\}$. If we
consider the second and the third output as a single output,
then we can get a new network in Fig. 5(a) whose probability
distribution is $\{\frac{x}{2^{k+1}}, \frac{y}{2^{k+1}}, \frac{z}{2^{k+1}}\}$.

Hence, for any probability distribution $\{\frac{x}{2^n}, \frac{y}{2^n}, \frac{z}{2^n}\}$ with
$x + y + z = 2^n$, we can always construct a network with $n$
splitters to realize it.

Now, in order to realize probability $\frac{a}{b}$ with $2^{n-1} < b <
2^n$ for some $n$, we can construct a network with probability
distribution $\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n - b}{2^n}\}$ with $n$ splitters and connect the
last output (output 2) to the starting point of the network, as
shown in Fig. 5(b). Using the method of absorbing Markov
chains, we can obtain that the probability for a token to reach
output 0 is $\frac{a}{b}$. A simple understanding for this result is that:

(1) the ratio of the probabilities for a token to reach the first
output and the second output is $\frac{a}{2^n} : \frac{b-a}{2^n}$, which equals $a : (b - a)$;

(2) the sum of these two probabilities is 1, since the tokens
will finally reach one of the two outputs.
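
The same result also follows from a direct summation over the number of times a token is fed back before it exits. This short derivation is added here as a sanity check and uses only the distribution $\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n-b}{2^n}\}$ stated above:

```latex
\Pr[\text{output } 0]
  = \frac{a}{2^n} \sum_{t \ge 0} \left( \frac{2^n - b}{2^n} \right)^{t}
  = \frac{a}{2^n} \cdot \frac{1}{1 - \frac{2^n - b}{2^n}}
  = \frac{a}{2^n} \cdot \frac{2^n}{b}
  = \frac{a}{b}.
```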

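b) Now we prove that with $n$ splitters, only rational probabilities
$\frac{a}{b}$ with integers $0 \le a \le b \le 2^n$ can be realized.
Any flow network with $n$ splitters can be described
as an absorbing Markov chain with $n$ transient states and 2
absorbing states, whose transition matrix $P$ can be written as

$$P = \begin{pmatrix}
p_{11} & \cdots & p_{1n} & p_{1(n+1)} & p_{1(n+2)} \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
p_{n1} & \cdots & p_{nn} & p_{n(n+1)} & p_{n(n+2)} \\
0 & \cdots & 0 & 1 & 0 \\
0 & \cdots & 0 & 0 & 1
\end{pmatrix},$$

where each of the first $n$ rows consists of two $\frac{1}{2}$ entries and $n$ zeros.

Let

$$Q = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}, \quad
R = \begin{pmatrix} p_{1(n+1)} & p_{1(n+2)} \\ \vdots & \vdots \\ p_{n(n+1)} & p_{n(n+2)} \end{pmatrix},$$

then the probability distribution of the network can be written
as

$$e_1 (I - Q)^{-1} R.$$

In order to prove the result in the theorem, we only need
to prove that $(I - Q)^{-1} R$ can be written as $\frac{1}{b} A$ with $b \le 2^n$,
where $A$ is an integer matrix (all the entries in $A$ are integers).

Let $K = I - Q$. We know that $K$ is invertible if and only if
$\det(K) \ne 0$. In this case, we have

$$(K^{-1})_{ij} = \frac{K_{ji}}{\det(K)},$$

where $K_{ji}$ is defined as the determinant of the square matrix
of order $(n - 1)$ obtained from $K$ by removing the $i$th row
and the $j$th column, multiplied by $(-1)^{i+j}$.

Since each entry of $K$ is an integer multiple of $\frac{1}{2}$ (the entries of
$Q$ lie in $\{0, \frac{1}{2}, 1\}$), $K_{ji}$ can
be written as $\frac{k_{ji}}{2^{n-1}}$ for some integer $k_{ji}$ and $\det(K)$ can be
written as $\frac{b}{2^n}$ for some integer $b$. According to Lemma 1 in
the appendix, we have $0 \le \det(K) \le 1$, which leads us to
$0 < b \le 2^n$ (note that $\det(K) \ne 0$).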

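Then, we have that

$$K^{-1} = \frac{1}{\det(K)} \begin{pmatrix}
K_{11} & K_{21} & \cdots & K_{n1} \\
K_{12} & K_{22} & \cdots & K_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
K_{1n} & K_{2n} & \cdots & K_{nn}
\end{pmatrix}
= \frac{2}{b} \begin{pmatrix}
k_{11} & k_{21} & \cdots & k_{n1} \\
k_{12} & k_{22} & \cdots & k_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
k_{1n} & k_{2n} & \cdots & k_{nn}
\end{pmatrix}.$$

Since each entry of $R$ is also in $\{0, \frac{1}{2}, 1\}$, we know that

$$2R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ \vdots & \vdots \\ r_{n1} & r_{n2} \end{pmatrix}$$

is an integer matrix.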
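As a result

$$K^{-1} R = \frac{1}{b} \begin{pmatrix}
k_{11} & k_{21} & \cdots & k_{n1} \\
k_{12} & k_{22} & \cdots & k_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
k_{1n} & k_{2n} & \cdots & k_{nn}
\end{pmatrix} 2R
= \frac{1}{b} \begin{pmatrix}
k_{11} & k_{21} & \cdots & k_{n1} \\
k_{12} & k_{22} & \cdots & k_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
k_{1n} & k_{2n} & \cdots & k_{nn}
\end{pmatrix}
\begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ \vdots & \vdots \\ r_{n1} & r_{n2} \end{pmatrix}
= \frac{A}{b},$$

where each entry of $A$ is an integer. So all the probabilities in
the final distribution are of the form $\frac{a}{b}$.

This completes the proof. ■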
Based on the method in the theorem above, we can realize
an arbitrary rational probability with an optimal-sized network.
The construction has two steps:

1) Construct a network with output distribution
$\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n - b}{2^n}\}$ iteratively using at most $n$ splitters.

2) Connect the last output to the starting point, such that
the distribution of the resulting network is $\{\frac{a}{b}, \frac{b-a}{b}\}$.

When $b = 2^n$ for some $n$, the construction above is exactly
the generating tree construction in Knuth and Yao's scheme
as described in Section II. Now, assume we want to realize
probability $\frac{14}{29}$. We can first generate a probability distribution
$\{\frac{14}{32}, \frac{15}{32}, \frac{3}{32}\}$, which can be realized by adding one splitter to
a network with probability distribution $\{\frac{7}{16}, \frac{6}{16}, \frac{3}{16}\}$. Recursively,
we can have the following probability distributions:

$$\{\tfrac{14}{32}, \tfrac{15}{32}, \tfrac{3}{32}\} \to \{\tfrac{7}{16}, \tfrac{6}{16}, \tfrac{3}{16}\} \to \{\tfrac{2}{8}, \tfrac{3}{8}, \tfrac{3}{8}\} \to \{\tfrac{1}{4}, 0, \tfrac{3}{4}\} \to \{\tfrac{1}{2}, 0, \tfrac{1}{2}\}.$$

As a result, we get a network to generate probability
distribution $\{\frac{14}{32}, \frac{15}{32}, \frac{3}{32}\}$, as shown in Fig. 6(a), where only
5 splitters are used. Connecting the last output to the starting
point results in the network in Fig. 6(b) with probability $\frac{14}{29}$.
Comparing the results in Theorem 2 with those in Theorem
1, we see that introducing loops into networks can strongly
enhance their expressibility.
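
The recursive reduction used in this example can be sketched in a few lines. The names `halve_step` and `reduction_chain` are hypothetical, the distributions are represented by their integer numerators, and the tie-breaking rule (which of two equal odd numerators feeds the additional splitter) is an assumption chosen so that the output matches the chain listed above.

```python
from fractions import Fraction

def halve_step(dist):
    """Map numerators summing to 2^(k+1) to numerators summing to 2^k."""
    odd = [i for i, v in enumerate(dist) if v % 2]
    if not odd:                      # all even: the same network, numerators halved
        return [v // 2 for v in dist]
    # The sum is even, so exactly two numerators are odd.
    lo, hi = sorted(odd, key=lambda i: (dist[i], -i))
    even = 3 - lo - hi               # index of the remaining (even) numerator
    new = list(dist)
    new[even] = dist[even] // 2
    new[hi] = (dist[hi] - dist[lo]) // 2
    new[lo] = dist[lo]               # this output feeds the additional splitter
    return new

def reduction_chain(a, b):
    """Return (n, chain) for target a/b, starting from {a, b - a, 2^n - b} / 2^n."""
    n = max(1, (b - 1).bit_length())  # smallest n >= 1 with b <= 2^n
    dist = [a, b - a, 2 ** n - b]
    chain = [tuple(dist)]
    while sum(dist) > 2:
        dist = halve_step(dist)
        chain.append(tuple(dist))
    return n, chain

n, chain = reduction_chain(14, 29)
# Feeding the third output back to the start leaves a : (b - a) for the first two
p_feedback = Fraction(chain[0][0], chain[0][0] + chain[0][1])
```

Reading the chain from right to left gives the order in which splitters are added, one per step, so the network for $\frac{14}{29}$ uses $n = 5$ splitters.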

IV. Expected Latency of Optimal Construction

Besides network size, another important issue of a stochastic
flow network is the expected operating time, which we call the
expected latency, defined as the expected number of splitters
a token needs to pass before reaching one of the outputs. For
the optimal-sized construction proposed in the above section,
we have the following results about its expected latency.

Theorem 3. Given a network with rational probability $\frac{a}{b}$ with
$b \le 2^n$ constructed using the optimal-sized construction, its
expected latency $ET$ is upper bounded by

$$ET \le \left(\frac{3n}{4} + \frac{1}{2}\right) \frac{2^n}{b} < \frac{3n}{2} + 1.$$

Fig. 6. (a) The network to realize probability distribution $\{\frac{14}{32}, \frac{15}{32}, \frac{3}{32}\}$.
(b) The network to realize probability $\frac{14}{29}$.


By making the construction more sophisticated, we can reduce the upper 
bound to (f + |)^. 



Proof: For the optimal-sized construction, we first prove
that the expected latency of the network with distribution
$\{\frac{x}{2^n}, \frac{y}{2^n}, \frac{z}{2^n}\}$ is bounded by $\frac{3n}{4} + \frac{1}{2}$.

Let's prove this by induction. When $n = 0$ or $n = 1$, it is
easy to see that this conclusion is true. Assume when $n = k$,
this conclusion is true, we want to show that the conclusion
still holds for $n = k + 2$. Note that in the optimal-sized
construction, a network with size $k + 2$ can be constructed
by adding two more splitters to a network with size $k$. Let
$T_k$ denote the latency of the network with size $k$, then

$$E[T_{k+2}] = E[T_k] + p_1 + p_2,$$

where $p_1$ is the probability for a token to reach the first
additional splitter and $p_2$ is the probability for a token to reach
the second additional splitter. Assume the distribution of the
network with size $k$ is $\{q_1, q_2, q_3\}$, then

$$p_1 + p_2 \le \frac{3}{2}.$$

So the conclusion is true for $n = k + 2$. By induction, we
know that it holds for all $n \in \{0, 1, 2, \ldots\}$.

Secondly, we prove that if the expected latency of the network
with distribution $\{q_1, q_2, q_3\}$ is $ET'$, then by connecting
its last output to its starting point, we can get a network such
that its expected latency is $ET = \frac{ET'}{q_1 + q_2}$. This conclusion can
be obtained immediately from

$$ET = ET' + q_3 (ET).$$

This completes the proof. ■

Fig. 7. Illustration for the construction of a network with unbounded expected
latency. Here, we have $p_x \ge p_y \ge p_z$.
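
For completeness, the last step can be written out explicitly. The derivation below only uses $q_1 + q_2 + q_3 = 1$ and, for the optimal-sized construction, $q_1 + q_2 = \frac{b}{2^n}$:

```latex
ET = ET' + q_3 \, ET
\;\Longrightarrow\; ET \, (1 - q_3) = ET'
\;\Longrightarrow\; ET = \frac{ET'}{q_1 + q_2} = ET' \cdot \frac{2^n}{b}
\le \left( \frac{3n}{4} + \frac{1}{2} \right) \frac{2^n}{b}.
```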

Theorem 4. There exists a network of size $n$ constructed using
the optimal-sized construction such that its expected latency
$ET$ is lower bounded by

$$ET \ge \frac{n}{3} + \frac{2}{3}.$$

Proof: We only need to construct a network with distribution
$\{\frac{x}{2^n}, \frac{y}{2^n}, \frac{z}{2^n}\}$ for some integers $x, y, z$ such that its
expected latency is lower bounded by $\frac{n}{3} + \frac{2}{3}$.

Let's construct such a network in the following way: Starting
from a network with a single splitter, and at each step
adding one more splitter. Assume the current distribution is
$\{p_x, p_y, p_z\}$ with $p_x \ge p_y \ge p_z$ (if this is not true, we can
change the order of the outputs), then we can add an additional
splitter to $p_x$ as shown in Fig. 7. Iteratively, with $n$ splitters,
we can construct a network with distribution $\{\frac{x}{2^n}, \frac{y}{2^n}, \frac{z}{2^n}\}$ for
some integers $x, y, z$ and its expected latency is at least
$\frac{n}{3} + \frac{2}{3}$.

By connecting one output with probability smaller than $\frac{1}{2}$
to the starting point, we can get such a network. ■

The theorems above show that the upper bound of the 
expected latency of a stochastic flow network based on the 
optimal-sized construction is not well-bounded. However, this 
upper bound only reflects the worst case. That does not 
mean that the optimal-sized construction always has a bad 
performance in expected latency when the network size is 
large. Let's consider the case that the target probability is
$\frac{a}{b}$ with $b = 2^n$ for some $n$. In this case, the optimal-sized
construction leads to a tree structure, whose expected latency
can be written as

$$ET = \sum_{i=1}^{n} \frac{i}{2^i} + \frac{n}{2^n} = 2 - \frac{1}{2^{n-1}},$$

which is well-bounded by 2.
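
The closed form for the tree latency is easy to confirm numerically. The sketch below assumes the tree of Fig. 4, in which leaf $A_i$ ($1 \le i \le n$) is reached with probability $2^{-i}$ after $i$ splitters and leaf $A_{n+1}$ with probability $2^{-n}$ after $n$ splitters; the function name `tree_latency` is illustrative.

```python
from fractions import Fraction

def tree_latency(n):
    """Expected number of splitters passed in the n-splitter tree of Fig. 4."""
    total = sum(Fraction(i, 2 ** i) for i in range(1, n + 1))
    # The last leaf A_{n+1} is also reached after n splitters.
    return total + Fraction(n, 2 ** n)
```

The value approaches 2 from below as $n$ grows, so the latency of the tree does not depend on the network size in any significant way.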

V. Alternative Constructions 

In the last section, we showed that the expected latency
of a stochastic flow network based on the optimal-sized
construction is not always well-bounded. In this section, we
give two other constructions, called size-relaxed construction
and latency-oriented construction. They take both the network
size and the expected latency into consideration. Table I shows
the summary of the results in this section, from which we can
see that there is a tradeoff between the upper bound on the
network size and the upper bound on the expected latency.

A. Size-Relaxed Construction 

Assume that the desired probability is $\frac{a}{b}$ with $2^{n-1} < b \le
2^n$ for some $n$. In this subsection, we give a construction,
called size-relaxed construction, for realizing $\frac{a}{b}$ with at most
$n + 3$ splitters, and its expected latency is well-bounded by a
constant.

Assume $a$ and $b$ are relatively prime, and let $c = b - a$. Then
$\frac{a}{2^n}$ and $\frac{c}{2^n}$ can be represented as binary expansions, namely

$$\frac{a}{2^n} = \sum_{i=1}^{n} a_i 2^{-i}, \qquad \frac{c}{2^n} = \sum_{i=1}^{n} c_i 2^{-i}.$$

Let's start from the structure in Fig. 8, where the probability
of $A_i$ with $1 \le i \le n$ is $2^{-i}$ and the probability of $A_{n+1}$ is
$2^{-n}$. We connect $A_i$ with $1 \le i \le n+1$ to one of $\{B_1, B_2, B_3$
and output 2$\}$, such that the probability distribution of the
outputs is $\{\frac{a}{2^{n+1}}, \frac{c}{2^{n+1}}, \frac{2^{n+1} - b}{2^{n+1}}\}$. Based on the values of $a_i$, $c_i$
with $1 \le i \le n$ (from binary expansions of $\frac{a}{2^n}$ and $\frac{c}{2^n}$), we
have the following rules for these connections:

1) If $a_i = c_i = 1$, connect $A_i$ with $B_1$.

2) If $a_i = 1$, $c_i = 0$, connect $A_i$ with $B_2$.

3) If $a_i = 0$, $c_i = 1$, connect $A_i$ with $B_3$.

4) If $a_i = c_i = 0$, connect $A_i$ with output 2.

5) Connect $A_{n+1}$ with output 2.

Assume that the probability for a token to reach $B_j$ with
$1 \le j \le 3$ is $P(B_j)$, then we have

$$P(B_1) = \sum_{i=1}^{n} I_{(a_i = c_i = 1)} 2^{-i},$$
$$P(B_2) = \sum_{i=1}^{n} I_{(a_i = 1, c_i = 0)} 2^{-i},$$
$$P(B_3) = \sum_{i=1}^{n} I_{(a_i = 0, c_i = 1)} 2^{-i},$$

where $I_\phi = 1$ if and only if $\phi$ is true, otherwise $I_\phi = 0$.

TABLE I
The comparison of different constructions, here $\frac{2^n}{b} < 2$.

                    Optimal-Sized Construction                          Size-Relaxed Construction      Latency-Oriented Construction
Network size        $\le n$                                             $\le n + 3$                    $\le 2(n - 1)$
Expected latency    $\le (\frac{3n}{4} + \frac{1}{2}) \frac{2^n}{b}$    $\le 6 \cdot \frac{2^n}{b}$    $\le 3.585 \cdot \frac{2^n}{b}$

Fig. 8. The framework to realize probability $\frac{a}{b}$.

As a result, the probability for a token to reach the first
output is

$$P_1 = \frac{1}{2} \left( P(B_1) + P(B_2) \right) = \frac{1}{2} \sum_{i=1}^{n} I_{(a_i = 1)} 2^{-i} = \frac{a}{2^{n+1}}.$$

Similarly, the probability for a token to reach the second output
is

$$P_2 = \frac{1}{2} \left( P(B_1) + P(B_3) \right) = \frac{b - a}{2^{n+1}}.$$

So far, we get that the distribution of the network is
$\{\frac{a}{2^{n+1}}, \frac{b-a}{2^{n+1}}, \frac{2^{n+1} - b}{2^{n+1}}\}$. Similar to Theorem 2, by connecting
output 2 to the starting point, we get a new network
with probability $\frac{a}{b}$. Note that compared to the optimal-sized
construction, 3 more splitters are used in the size-relaxed
construction to realize the desired probability. But it has a
much better upper bound on the expected latency as shown in
the following theorem.
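
The connection rules and the resulting distribution can be checked with a short sketch. The names `size_relaxed_wiring` and `output_distribution` are hypothetical. The behaviour of $B_1$, $B_2$, $B_3$ is an assumption inferred from the formulas for $P_1$ and $P_2$ above: $B_1$ is taken to split between outputs 0 and 1, $B_2$ between output 0 and output 2, and $B_3$ between output 1 and output 2. The sketch assumes $0 < a < b$.

```python
from fractions import Fraction

def size_relaxed_wiring(a, b):
    """Apply rules 1)-5): map each leaf index i to 'B1', 'B2', 'B3' or 'out2'."""
    n = max(1, (b - 1).bit_length())    # 2^(n-1) < b <= 2^n
    c = b - a
    rule = {(1, 1): "B1", (1, 0): "B2", (0, 1): "B3", (0, 0): "out2"}
    wiring = {}
    for i in range(1, n + 1):
        a_i = (a >> (n - i)) & 1        # i-th binary digit of a / 2^n
        c_i = (c >> (n - i)) & 1        # i-th binary digit of c / 2^n
        wiring[i] = rule[(a_i, c_i)]
    wiring[n + 1] = "out2"
    return n, wiring

def output_distribution(n, wiring):
    """Distribution over (output 0, output 1, output 2) before adding feedback."""
    reach = {i: Fraction(1, 2 ** i) for i in range(1, n + 1)}
    reach[n + 1] = Fraction(1, 2 ** n)
    at = {"B1": Fraction(0), "B2": Fraction(0), "B3": Fraction(0), "out2": Fraction(0)}
    for leaf, target in wiring.items():
        at[target] += reach[leaf]
    # Assumed roles of the three extra splitters (see the lead-in)
    p0 = (at["B1"] + at["B2"]) / 2
    p1 = (at["B1"] + at["B3"]) / 2
    p2 = at["out2"] + (at["B2"] + at["B3"]) / 2
    return p0, p1, p2

n, wiring = size_relaxed_wiring(7, 29)
p0, p1, p2 = output_distribution(n, wiring)
```

For $a = 7$, $b = 29$ this yields $\{\frac{7}{64}, \frac{22}{64}, \frac{35}{64}\}$, and feeding output 2 back to the start leaves the ratio $7 : 22$, that is, probability $\frac{7}{29}$.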

Theorem 5. Given a network with probability $\frac{a}{b}$ ($2^{n-1} < b \le 2^n$) constructed using the size-relaxed construction, its expected latency $ET$ is bounded by

$$ET \le 6\,\frac{2^n}{b} < 12.$$







Fig. 9. The network to realize probability $\frac{7}{29}$.



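Proof: First, without the feedback, the expected latency for a token to reach $B_1$, $B_2$, $B_3$ or output 2 is less than 2. This can be obtained from the example in the last section. As a result, without the feedback, the expected latency for a token to reach one of the outputs is less than 3. Since a token leaves the network in each pass with probability $\frac{b}{2^{n+1}}$, the expected number of passes is $\frac{2^{n+1}}{b}$, and hence $ET < 3 \cdot \frac{2^{n+1}}{b} = 6\,\frac{2^n}{b}$. This gives the theorem. ■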
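The bound can be compared with an exact latency computation. The sketch below is an illustration under a stated assumption about Fig. 8, namely that the framework is a chain of $n$ unbiased splitters in which $A_i$ ($1 \le i \le n$) leaves the chain at the $i$-th splitter and $A_{n+1}$ is the remaining edge of the $n$-th splitter. The function name `size_relaxed_latency` is hypothetical.

```python
from fractions import Fraction


def size_relaxed_latency(a: int, b: int, n: int) -> Fraction:
    """Exact expected latency of the size-relaxed network for a/b.

    Assumes the chain structure described in the lead-in: a token on A_i
    has passed i splitters, plus one more if it is routed through some B_j.
    """
    assert 0 < a < b <= 2 ** n
    c = b - a
    per_pass = Fraction(0)                     # expected latency of one pass
    for i in range(1, n + 1):
        a_i = (a >> (n - i)) & 1
        c_i = (c >> (n - i)) & 1
        extra = 1 if (a_i or c_i) else 0       # rules 1-3 add one splitter
        per_pass += Fraction(i + extra, 2 ** i)
    per_pass += Fraction(n, 2 ** n)            # A_{n+1}: n splitters, then output 2
    p_exit = Fraction(b, 2 ** (n + 1))         # probability of leaving in one pass
    # Passes are independent, so ET = E[latency of a pass] / P(exit in a pass).
    return per_pass / p_exit


print(size_relaxed_latency(7, 29, 5))          # 170/29
```

For $\frac{7}{29}$ this gives about 5.86, which is below the bound $6 \cdot \frac{32}{29} \approx 6.62$ of Theorem 5.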
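Let's give an example of the size-relaxed construction. Assume the desired probability is $\frac{7}{29}$; then we can write $\frac{a}{2^n}$ and $\frac{c}{2^n}$ as binary expansions:

$$\frac{a}{2^n} = \frac{7}{32} = 0.00111, \qquad \frac{c}{2^n} = \frac{22}{32} = 0.10110.$$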



According to the rules above, we connect $A_1$ to $B_3$, $A_2$ to output 2, and so on. After connecting output 2 to the starting point, we can get a network with probability $\frac{7}{29}$, as shown in Fig. 9.

Another advantage of the size-relaxed construction is that it allows us to build a Universal Probability Generator (UPG) efficiently with $a_i, c_i$ ($1 \le i \le n$) as inputs, such that its probability output is $\frac{a}{a+c} = \frac{a}{b}$. The definition and description of UPG can be found in [11]. Instead of connecting $A_i$ with $1 \le i \le n$ to one of $\{B_1, B_2, B_3, \text{output } 2\}$ directly, we insert a deterministic device as shown in Fig. 10. At each node of this device, if its corresponding input is 1, all the incoming tokens will exit the left outgoing edge. If the input is 0, all the incoming tokens will exit the right outgoing edge. As a result,

Fig. 10. The deterministic device to control flow in UPG. 




Fig. 11. The network to realize probability distribution $\{\frac{14}{32}, \frac{15}{32}, \frac{3}{32}\}$ using Knuth and Yao's scheme.



the connections between $A_i$ and $\{B_1, B_2, B_3, \text{output } 2\}$ are automatically controlled by the inputs $a_i$ and $c_i$ with $1 \le i \le n$. Finally, we get a Universal Probability Generator (UPG), whose output probability is

$$\frac{\sum_{i=1}^{n} a_i 2^{-i}}{\sum_{i=1}^{n} (a_i + c_i) 2^{-i}} = \frac{a}{a + c} = \frac{a}{b}.$$
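As a quick sanity check of this formula, the output probability can be evaluated directly from the input bits. The helper `upg_output` below is a hypothetical name used only for illustration; it evaluates the formula and does not model the deterministic device itself.

```python
from fractions import Fraction


def upg_output(a_bits, c_bits):
    """Evaluate sum(a_i 2^-i) / sum((a_i + c_i) 2^-i) for bit lists a_1..a_n, c_1..c_n."""
    num = sum(Fraction(a_i, 2 ** i) for i, a_i in enumerate(a_bits, start=1))
    den = sum(Fraction(a_i + c_i, 2 ** i)
              for i, (a_i, c_i) in enumerate(zip(a_bits, c_bits), start=1))
    return num / den


# Inputs for a/2^n = 0.00111 and c/2^n = 0.10110 (a = 7, c = 22, n = 5).
print(upg_output([0, 0, 1, 1, 1], [1, 0, 1, 1, 0]))   # 7/29
```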



B. Latency-Oriented Construction 

In this subsection, we propose another construction, called the latency-oriented construction. It uses more splitters than the size-relaxed construction, but achieves a better upper bound on the expected latency. Similar to the optimal-sized construction, this construction first realizes the distribution $\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n-b}{2^n}\}$, and then connects the last output to the starting point. The difference is that in the latency-oriented construction, this distribution is realized by applying Knuth and Yao's scheme [8], which was introduced in the section of preliminaries.

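Let's go back to the example of realizing probability $\frac{14}{29}$. According to Knuth and Yao's scheme, we first need to find the atoms for the binary expansions of $\frac{14}{32}, \frac{15}{32}, \frac{3}{32}$, i.e.,

$$\frac{14}{32} \to \left(\frac{1}{4}, \frac{1}{8}, \frac{1}{16}\right),$$

$$\frac{15}{32} \to \left(\frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{32}\right),$$

$$\frac{3}{32} \to \left(\frac{1}{16}, \frac{1}{32}\right).$$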

Then we allot these atoms to a binary tree, as shown in Fig. 11. In this tree, the probability for a token to reach outputs labeled 0 is $\frac{14}{32}$, the probability for a token to reach outputs labeled 1 is $\frac{15}{32}$, and the probability for a token to reach outputs labeled 2 is $\frac{3}{32}$. If we connect the outputs labeled 2 to the starting point, the desired probability $\frac{14}{29}$ can be achieved.
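The atom decomposition, the number of splitters, and the expected latency of this example can be reproduced with a short script. This is a sketch, not the authors' implementation; the names `atoms` and `knuth_yao_summary` are hypothetical. It relies on two facts used in the text: a binary tree with $N$ leaves (atoms) has $N - 1$ inner nodes (splitters), and an atom of value $2^{-j}$ sits at depth $j$.

```python
from fractions import Fraction


def atoms(numerator: int, n: int):
    """Atoms 2^-j in the binary expansion of numerator / 2^n."""
    return [Fraction(1, 2 ** j) for j in range(1, n + 1)
            if (numerator >> (n - j)) & 1]


def knuth_yao_summary(a: int, b: int, n: int):
    """Return (splitters, latency without feedback, latency with feedback)."""
    assert 0 < a < b <= 2 ** n
    parts = [a, b - a, 2 ** n - b]
    all_atoms = [x for p in parts for x in atoms(p, n)]
    splitters = len(all_atoms) - 1
    # An atom 2^-j is reached after passing j splitters.
    no_feedback = sum(x * (x.denominator.bit_length() - 1) for x in all_atoms)
    with_feedback = no_feedback * Fraction(2 ** n, b)
    return splitters, no_feedback, with_feedback


print(atoms(14, 5))                   # [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)]
print(knuth_yao_summary(14, 29, 5))   # (8, Fraction(45, 16), Fraction(90, 29))
```

For $\frac{14}{29}$ the tree has 9 atoms and therefore 8 splitters, and the expected latency after feedback is $\frac{90}{29} \approx 3.10$.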

Theorem 6. Given a network with probability $\frac{a}{b}$ ($2^{n-1} < b \le 2^n$) constructed using the latency-oriented construction, its network size is bounded by $2(n - 1)$ and its expected latency $ET$ is bounded by

$$ET \le (\log_2 3 + 2)\,\frac{2^n}{b} < 7.2.$$



Proof: Let's first consider the network with distribution $\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n-b}{2^n}\}$, which is constructed using Knuth and Yao's scheme.

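1) The network size is bounded by $2(n - 1)$. To prove this, let's use $k_j$ to denote the number of atoms with value $2^{-j}$, and use $a_j$ to denote the number of nodes with depth $j$ in the tree. Then $k_j$ and $a_j$ have the following recursive relations,

$$a_n = k_n, \qquad a_j = k_j + \frac{a_{j+1}}{2}, \quad \forall\, 1 \le j \le n - 1.$$

As a result,

$$\sum_{j=1}^{n-1} k_j = \sum_{j=1}^{n-1} \left(a_j - \frac{a_{j+1}}{2}\right).$$

From this, we get that the total number of atoms in the tree is

$$N = \sum_{j=1}^{n} k_j = \sum_{j=1}^{n} \frac{a_j}{2} + \frac{a_1}{2}.$$

We know that $k_j$ and $a_j$ also satisfy the following constraints,

$$k_j \le 3, \quad \forall\, 1 \le j \le n,$$
$$a_j \bmod 2 = 0, \quad \forall\, 1 \le j \le n.$$

From $j = n$ to $j = 1$, by induction, we can prove that

$$a_j \le 4, \quad \forall\, 1 \le j \le n.$$

That is because $a_j$ is even, and if $a_{j+1} \le 4$, then

$$\frac{a_j}{2} \le \left\lfloor \frac{k_j + \frac{a_{j+1}}{2}}{2} \right\rfloor \le 2.$$

Since $a_n, a_1 \le 2$, we can get that

$$N = \frac{a_n}{2} + a_1 + \sum_{j=2}^{n-1} \frac{a_j}{2} \le 2n - 1.$$

To create $N$ atoms, we need $N - 1 \le 2(n - 1)$ splitters.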

2) The expected latency $ET'$ of the network with distribution $\{\frac{a}{2^n}, \frac{b-a}{2^n}, \frac{2^n-b}{2^n}\}$ is bounded by $ET' \le \log_2 3 + 2$. That is because the expected latency $ET'$ is equal to the expected number of fair bits required. By the result of Knuth and Yao, this number is at most the entropy of the distribution plus 2, and the entropy of a distribution with three outcomes is at most $\log_2 3$.

Now we can get a new network by connecting the last output to the starting point. The size of the network is unchanged and the expected latency of the new network is $ET = ET' \frac{2^n}{b}$. So we get the results in the theorem. ■
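The counting argument in part 1) of the proof can be traced on the running example. The sketch below (with the hypothetical helper name `level_counts`) computes $k_j$ from the binary expansions and then $a_j$ from the recursion $a_n = k_n$, $a_j = k_j + a_{j+1}/2$, so that the bounds $a_j \le 4$ and $N \le 2n - 1$ can be inspected directly.

```python
def level_counts(parts, n):
    """Return lists k and a indexed by depth j = 1..n (index 0 is unused).

    parts are the numerators of the distribution over 2^n,
    e.g. [a, b - a, 2^n - b].
    """
    k = [0] * (n + 1)
    for p in parts:
        for j in range(1, n + 1):
            k[j] += (p >> (n - j)) & 1      # one atom 2^-j per set bit
    a = [0] * (n + 2)                       # a[n + 1] = 0 closes the recursion
    for j in range(n, 0, -1):
        a[j] = k[j] + a[j + 1] // 2         # leaves plus parents of depth j + 1
    return k, a[: n + 1]


k, a = level_counts([14, 15, 3], 5)
print(k[1:], a[1:])                         # [0, 2, 2, 3, 2] [2, 4, 4, 4, 2]
N = sum(k)
print(N, sum(a[1:]) // 2 + a[1] // 2)       # 9 9
```

Here $N = 9 = 2n - 1$, so the example attains the bound of $2(n-1) = 8$ splitters.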
Fig. 12. The network to realize a probability distribution of the form $\{\frac{1}{b}, \frac{1}{b}, \ldots, \frac{1}{b}\}$.

VI. Generating Rational Distributions 

In this section, we generalize our results to generate an arbitrary rational probability distribution $\{q_1, q_2, \ldots, q_m\}$ with $m > 2$. Two different methods will be proposed and studied. The first method is based on Knuth and Yao's scheme and it is a direct generalization of the latency-oriented construction. The second method is based on a construction with a binary-tree structure. At each inner node of the binary tree, one probability is split into two probabilities. As a result, using a binary-tree structure, the probability one can be split into $m$ probabilities (as a distribution) marked on all the $m$ leaves. In the rest of this section, we will discuss and analyze these two methods. Since we consider rational probability distributions, we can write $\{q_1, q_2, \ldots, q_m\}$ as $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$ with integers $a_1, a_2, \ldots, a_m, b$ and $b$ minimized.

A. Based on Knuth and Yao 's scheme 

In order to generate the distribution $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$ with $2^{n-1} < b \le 2^n$ for some $n$, we can first construct a network with distribution $\{\frac{a_1}{2^n}, \frac{a_2}{2^n}, \ldots, \frac{a_m}{2^n}, \frac{2^n - b}{2^n}\}$ using Knuth and Yao's scheme. Then by connecting the last output to the starting point, we obtain a network with distribution $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$. In order to study the properties of this method, we will analyze two extreme cases: (1) $m = b$ and (2) $m \ll b$.

When $m = b$, the target probability distribution can be written as $\{\frac{1}{b}, \frac{1}{b}, \ldots, \frac{1}{b}\}$. For this distribution, we have the following theorem about the network constructed using the method based on Knuth and Yao's scheme.

Theorem 7. For a distribution $\{\frac{1}{b}, \frac{1}{b}, \ldots, \frac{1}{b}\}$, the method based on Knuth and Yao's scheme can construct a network with $b + h(b) - 1$ splitters. Here, we assume $b = 2^n - \sum_{i=0}^{n-1} \gamma_i 2^i$ and $h(b) = \sum_{i=0}^{n-1} \gamma_i$.
Proof: See the network in Fig. 12 as an example of the construction.

First, let's consider a complete tree with depth $n$. The network size of such a tree (i.e., the number of parent nodes) is $2^n - 1$, denoted by $N_{complete}$.

Let $N(b)$ be the network size of the construction above to realize the distribution $\{\frac{1}{b}, \frac{1}{b}, \ldots, \frac{1}{b}\}$. Assume

$$2^n - b = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_H},$$

with $n > a_1 > a_2 > \cdots > a_H$, is a binary expansion of $2^n - b$; then we can get the difference between the size of the construction and the size of the complete binary tree

$$\Delta = N_{complete} - N(b) = \sum_{i=1}^{H} (2^{a_i} - 1) = 2^n - b - H.$$

So the network size of the construction $N(b)$ is

$$N(b) = 2^n - 1 - (2^n - b - H) = b + H - 1,$$

where $H = \sum_{i=0}^{n-1} \gamma_i = h(b)$. ■
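The size formula is easy to tabulate. In the sketch below, `uniform_network_size` is a hypothetical helper that picks the smallest $n$ with $b \le 2^n$ and evaluates $b + h(b) - 1$, where $h(b)$ is the number of ones in the binary expansion of $2^n - b$.

```python
import math


def uniform_network_size(b: int) -> int:
    """Splitters used for {1/b, ..., 1/b} according to N(b) = b + h(b) - 1."""
    assert b >= 2
    n = (b - 1).bit_length()                # smallest n with b <= 2^n
    h = bin(2 ** n - b).count("1")          # ones in the expansion of 2^n - b
    return b + h - 1


for b in (5, 6, 7, 8):
    size = uniform_network_size(b)
    # Compare with the trivial lower bound b - 1 and the upper bound b - 1 + log2(b).
    print(b, size, b - 1 <= size <= b - 1 + math.log2(b))
```

For $b = 5$ (with $n = 3$) the distribution before feedback is five atoms of value $\frac{1}{8}$ plus $\frac{3}{8} = \frac{1}{4} + \frac{1}{8}$, i.e., 7 atoms and 6 splitters, in agreement with the formula.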
Let $N^*(b)$ be the optimal size of a network that realizes the distribution $\{\frac{1}{b}, \frac{1}{b}, \ldots, \frac{1}{b}\}$. It is easy to see that $N^*(b) \ge b - 1$. Note that $h(b)$ is at most the number of bits in the binary expansion of $2^n - b$ (which is smaller than $b$), so we quickly get the following inequality

$$b - 1 \le N^*(b) \le N(b) \le b - 1 + \log_2 b.$$

It shows that the construction based on Knuth and Yao's scheme is near-optimal when $m = b$. More generally, we believe that when $m$ is large, this construction performs well in network size.

For a general $m$, we have the following results regarding the network size and expected latency.

Theorem 8. For a distribution $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$ with $b \le 2^n$, the method based on Knuth and Yao's scheme can construct a network with at most $m(n - \lfloor \log_2 m \rfloor + 1)$ splitters, such that its expected latency $ET$ is bounded by

$$H(X')\,\frac{2^n}{b} \le ET \le [H(X') + 2]\,\frac{2^n}{b},$$

where $\frac{2^n}{b} < 2$ and $H(X')$ is the entropy of the distribution

$$\left\{\frac{a_1}{2^n}, \frac{a_2}{2^n}, \ldots, \frac{a_m}{2^n}, \frac{2^n - b}{2^n}\right\}.$$

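Proof: We can use the same argument as that in Theorem 6. The proof for the expected latency is straightforward. Here, we only briefly describe the proof for the network size.

In the network that realizes $\{\frac{a_1}{2^n}, \frac{a_2}{2^n}, \ldots, \frac{a_m}{2^n}, \frac{2^n - b}{2^n}\}$, let's use $k_j$ to denote the number of atoms with value $2^{-j}$, and use $a_j$ to denote the number of nodes with depth $j$ in the tree. It can be proved that the total number of atoms in the tree is

$$N = \sum_{j=1}^{n} k_j = \sum_{j=1}^{n} \frac{a_j}{2} + \frac{a_1}{2}.$$

Here, the constraints are

$$k_j \le m + 1, \quad \forall\, 1 \le j \le n,$$
$$a_j \text{ is even}, \quad \forall\, 1 \le j \le n.$$

Recursively, we can get that $a_j \le 2m$ for all $1 \le j \le n$. For the first $\lfloor \log_2 2m \rfloor$ levels, we have

$$\sum_{j=1}^{\lfloor \log_2 2m \rfloor} a_j \le 4m.$$

Hence,

$$N = \sum_{j=1}^{\lfloor \log_2 2m \rfloor} \frac{a_j}{2} + \frac{a_1}{2} + \sum_{j=\lfloor \log_2 2m \rfloor + 1}^{n} \frac{a_j}{2} \le 2m + 1 + m(n - \lfloor \log_2 2m \rfloor) = m(n - \lfloor \log_2 m \rfloor + 1) + 1.$$

So we can conclude that $m(n - \lfloor \log_2 m \rfloor + 1)$ splitters are enough for realizing $\{\frac{a_1}{2^n}, \frac{a_2}{2^n}, \ldots, \frac{a_m}{2^n}, \frac{2^n - b}{2^n}\}$ as well as $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$. ■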

This theorem is a simple generalization of the results in Theorem 6. Here, the upper bound for the network size is tight only for small $m$.
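The quantities in Theorem 8 can be evaluated for a concrete distribution. The sketch below uses the hypothetical name `theorem8_bounds` and assumes that $n$ is chosen as the smallest integer with $b \le 2^n$; it returns the size bound together with the lower and upper latency bounds.

```python
import math
from fractions import Fraction


def theorem8_bounds(numerators, b):
    """Return (size bound, latency lower bound, latency upper bound)."""
    assert sum(numerators) == b and all(x > 0 for x in numerators)
    m = len(numerators)
    n = (b - 1).bit_length()                       # 2^(n-1) < b <= 2^n
    probs = [Fraction(x, 2 ** n) for x in list(numerators) + [2 ** n - b]]
    # Entropy H(X') of {a_1/2^n, ..., a_m/2^n, (2^n - b)/2^n}.
    entropy = -sum(float(p) * math.log2(float(p)) for p in probs if p > 0)
    scale = 2 ** n / b
    size_bound = m * (n - math.floor(math.log2(m)) + 1)
    return size_bound, entropy * scale, (entropy + 2) * scale


size_bound, lower, upper = theorem8_bounds([14, 15], 29)
print(size_bound, round(lower, 3), round(upper, 3))
```

For $\{\frac{14}{29}, \frac{15}{29}\}$ the exact expected latency of the Knuth and Yao tree computed from its atoms is $\frac{90}{29} \approx 3.10$, which lies between the two bounds.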

B. Based on binary-tree structure 

In this subsection, we propose another method to generate an arbitrary rational distribution $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$. The idea of this method is based on a binary-tree structure. We can describe the method in the following way: we construct a binary tree with $m$ leaves, where the weight of the $i$-th ($1 \le i \le m$) leaf is $q_i = \frac{a_i}{b}$. For each parent (inner) node, its weight is the sum of the weights of its two children. Recursively, we can get all the weights of the inner nodes in the tree, and the weight of the root node is 1. For each parent node, assume the weights of its two children are $w_1$ and $w_2$; then we can replace this parent node by a subnetwork which implements a splitter with probability distribution $\{\frac{w_1}{w_1 + w_2}, \frac{w_2}{w_1 + w_2}\}$. For each leaf, we treat it as an output. In this new network, a token will reach the $i$-th output with probability $q_i$.
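The following sketch illustrates why the construction yields the target distribution: multiplying the splitter probabilities along the path from the root to a leaf gives exactly the leaf weight. The tree encoding (nested pairs of leaf weights) and the names `weight` and `leaf_probabilities` are assumptions made for this illustration, and the example distribution is arbitrary.

```python
from fractions import Fraction


def weight(tree):
    """Weight of a node: a leaf is its own weight, a parent sums its children."""
    if isinstance(tree, tuple):
        return weight(tree[0]) + weight(tree[1])
    return tree


def leaf_probabilities(tree, reach=Fraction(1)):
    """Probability of reaching each leaf, left to right, when every parent
    with child weights w1, w2 splits tokens as {w1/(w1+w2), w2/(w1+w2)}."""
    if not isinstance(tree, tuple):
        return [reach]
    w1, w2 = weight(tree[0]), weight(tree[1])
    left = leaf_probabilities(tree[0], reach * w1 / (w1 + w2))
    right = leaf_probabilities(tree[1], reach * w2 / (w1 + w2))
    return left + right


q = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
tree = ((q[0], q[1]), (q[2], q[3]))
print(leaf_probabilities(tree) == q)    # True
```

Any other tree shape over the same leaves gives the same leaf probabilities; only the splitter distributions at the inner nodes change.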

For example, in order to realize the distribution 
we can first generate a binary- tree with 4 
leaves, as shown in Fig. 13(a). Then according to the method
above, we can obtain the weight of each node in this binary
tree, see Fig. 13(b). Based on these weights, we replace the
three parent nodes with three subnetworks, whose probability 
distributions are {^i^lil^jlljlfig;}- Eventually, we 
construct a network with the desired distribution as shown in
Fig. 13(c). It can be implemented with 1 + 2 + 2 = 5 splitters.

In the procedure above, any binary-tree with m leaves 
works. Among all these binary-trees, we need to find one 
such that the resulting network satisfies our requirements in 
network size and expected latency. For example, given the 
target distribution {| , 4, |, the binary tree depicted above 
does not result in an optimal-sized construction. When m is 
extremely small, such as 3 or 4, we can search all the binary trees
with $m$ leaves. However, the number of such binary trees grows
exponentially in $m$, so even when $m$ is a little larger, such as 10,
brute-force search becomes impractical. In
the rest of this section, we will show that the Huffman procedure
can create a binary tree with good performance in network
size and expected latency for most of the cases.

Huffman procedure can be described as follows 0: 

1) Draw $m$ nodes with weights $q_1, q_2, \ldots, q_m$.

2) Let $S$ denote the set of nodes without parents. Assume
node $A$ and node $B$ are the two nodes with the minimal
weights in $S$; then we add a new node as the parent
of $A$ and $B$, with weight $w(A) + w(B)$, where $w(X)$
is the weight of node $X$.




Fig. 13. (a) A binary-tree with 4 leaves, (b) Node weights in the binary 
tree, (c) The network to realize probability distribution yg}, where 

{ i , |},{|, j} can be realized using the methods in the sections above. 




Fig. 14. The tree constructed using Huffman procedure when the desired 
distribution is {0.1, 0.1, 0.15, 0.15, 0.2, 0.3} 

3) Repeat 2) until the size of S is 1. 

Fig. 14 shows an example of a binary tree constructed
by Huffman procedure, when the desired distribution is 
{0.1,0.1,0.15,0.15,0.2,0.3}. From 0, we know that using 
Huffman procedure, we can create a tree with minimal expected path length. Let $EL^*$ denote this minimal expected path length; then it satisfies the following inequality,

$$H(X) \le EL^* \le H(X) + 1,$$

where $H(X)$ is the entropy of the desired probability distribution $\{q_1, q_2, \ldots, q_m\} = \{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$.
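The three steps of the Huffman procedure and the inequality above can be illustrated with a short sketch. It is not tied to any particular figure layout; the function name `huffman_expected_length` is hypothetical, and it uses the standard fact that the expected path length of the tree equals the sum of the weights of all parent nodes.

```python
import heapq
import math
from fractions import Fraction


def huffman_expected_length(weights):
    """Expected path length EL* of the tree built by the Huffman procedure."""
    heap = list(weights)                 # step 1: one node per weight
    heapq.heapify(heap)
    expected = Fraction(0)
    while len(heap) > 1:                 # step 3: repeat until one node is left
        merged = heapq.heappop(heap) + heapq.heappop(heap)   # step 2
        expected += merged               # each parent adds its weight to EL*
        heapq.heappush(heap, merged)
    return expected


q = [Fraction(1, 10), Fraction(1, 10), Fraction(3, 20),
     Fraction(3, 20), Fraction(1, 5), Fraction(3, 10)]
el = huffman_expected_length(q)
entropy = -sum(float(p) * math.log2(float(p)) for p in q)
print(el)                                # 5/2
print(entropy <= el <= entropy + 1)      # True
```

For the distribution $\{0.1, 0.1, 0.15, 0.15, 0.2, 0.3\}$ the parent weights are $0.2, 0.3, 0.4, 0.6, 1$, so $EL^* = 2.5$, while the entropy is about $2.47$.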

Let $w_i$ denote the weight of the $i$-th parent node in the binary tree. In order to simplify our analysis, we assume that this parent node can be replaced by a subnetwork with about $\log_2(b w_i)$ splitters. This simplification is reasonable from the statistical perspective and according to the results about our constructions for realizing rational probabilities in the sections above. Then the size of the resulting network is approximately $\sum_{i=1}^{m-1} \log_2(b w_i)$. According to Lemma 2 in the Appendix, when $m$ is small, the Huffman procedure can
TABLE II
The comparison of different methods, here $\frac{2^n}{b} < 2$.

| | Based on Knuth and Yao's scheme | Based on binary-tree structure |
| Network size | $\le m(n - \lfloor \log_2 m \rfloor + 1)$ | $\le (m - 1)n$ |
| Expected latency | $\le (\log_2(m + 1) + 2)\frac{2^n}{b}$ | $\le (\log_2 m + 1)\,ET_{max}$ |



create a binary tree that minimizes $\sum_{i=1}^{m-1} \log_2 w_i$. As a result, among all the binary trees with $m$ leaves, the one constructed by the Huffman procedure has an optimal network size. However, this is only true under our assumption. For example, let's consider a desired distribution $\{q_1, q_2, \ldots, q_m\}$ with $\sum_{i \in S} q_i = \frac{1}{2}$ for some set $S$. In this case, the binary-tree structure based on the Huffman procedure may not be the best one.
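For small $m$ the claim that the Huffman procedure minimizes $\sum_{i} \log_2 w_i$ can be checked by brute force. The sketch below is an illustration with hypothetical function names; it minimizes the product of the parent weights, which is equivalent to minimizing the sum of their logarithms and allows exact rational arithmetic. Both functions include the root, whose weight is 1 for a probability distribution.

```python
import heapq
from fractions import Fraction
from itertools import combinations


def huffman_product(weights):
    """Product of the weights of all parent nodes in a Huffman tree."""
    heap = list(weights)
    heapq.heapify(heap)
    prod = Fraction(1)
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        prod *= merged
        heapq.heappush(heap, merged)
    return prod


def best_product(weights):
    """Minimum product of parent weights over all binary trees (brute force)."""
    weights = tuple(weights)
    if len(weights) == 1:
        return Fraction(1)                  # a single leaf has no parent node
    total = sum(weights)                    # weight of the root of this subtree
    rest = range(1, len(weights))
    best = None
    # Keep leaf 0 in the left subtree so mirrored splits are not counted twice.
    for r in range(len(weights) - 1):
        for chosen in combinations(rest, r):
            left = [weights[0]] + [weights[i] for i in chosen]
            right = [weights[i] for i in rest if i not in chosen]
            cand = total * best_product(left) * best_product(right)
            if best is None or cand < best:
                best = cand
    return best


q = [Fraction(k, 15) for k in (1, 2, 3, 4, 5)]
print(huffman_product(q) == best_product(q))    # True
```

The search is exponential in $m$ and is only meant for the small cases covered by Lemma 2.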

Now we can get the following conclusion about stochastic 
flow networks constructed using the method based on binary- 
tree structures. 

Theorem 9. For a distribution $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$ with $b \le 2^n$, the method based on binary-tree structures constructs a network with at most $(m - 1)n$ splitters. If the binary tree is constructed using the Huffman procedure, then the expected latency of the resulting network, namely $ET$, is upper bounded by

$$ET \le (H(X) + 1)\,ET_{max},$$

where $H(X)$ is the entropy of the target distribution and $ET_{max}$ is the maximum expected latency of the inner nodes in the binary tree.

Proof: 1) According to the optimal-sized construction, each of the $m - 1$ inner nodes can be implemented using at most $n$ splitters.

2) The upper bound on the expected latency follows immediately from the result that the expected path length satisfies $EL^* \le H(X) + 1$. ■



C. Comparison 

Let's have a brief comparison between the method based on Knuth and Yao's scheme and the method based on binary-tree structure. Generally, when $m$ is large, the method based on Knuth and Yao's scheme may perform better. When $m$ is small, the comparison between these two methods is given in Table II, where the desired distribution is $\{\frac{a_1}{b}, \frac{a_2}{b}, \ldots, \frac{a_m}{b}\}$ with $2^{n-1} < b \le 2^n$. In this table, we assume that the binary tree (in the second method) is constructed using the Huffman procedure. $ET_{max}$ denotes the maximum expected latency of the parent nodes in a given binary tree. It is still hard to say that one of the two methods performs strictly better than the other, whether in network size or expected latency. In fact, the performance of a construction usually depends on the numerical structure of the target distribution. In practice, we can evaluate both constructions on the actual values and choose the better one.



VII. Concluding Remarks 

Motivated by computing based on chemical reaction networks, we introduced the concept of stochastic flow networks and studied the synthesis of optimal-sized networks for realizing rational probabilities. We also studied the expected latency of stochastic flow networks, namely, the expected number of splitters a token needs to pass before reaching the output. Two constructions with well-bounded expected latency were proposed. Finally, we generalized our constructions to realize arbitrary rational probability distributions. Besides network size and expected latency, robustness is also an important issue in stochastic flow networks. Assuming the probability error of each splitter is bounded by a constant $\epsilon$, the robustness of a given network can be measured by the total probability error. It can be shown that most constructions in this paper are robust against small errors in the splitters.

To end this paper, we compare a few types of stochastic systems of the same size $n$ in Table III. Here we assume that the basic probabilistic elements in these systems have probability 1/2 and we want to use them to synthesize other probabilities. To compare the different systems fairly, we remove threshold logic circuits from the list, since their complexity is difficult to analyze. From this table, we see that stochastic flow networks perform very well in both expressibility and operating time. Future work includes the synthesis of stochastic flow networks to 'approximate' desired probabilities or distributions, and the study of the scenario in which the probability of each splitter is not $\frac{1}{2}$.
Appendix

Lemma 1. Given an $n \times n$ matrix $Q$ with each entry in $\{0, \frac{1}{2}, 1\}$, such that the sum of each row is at most 1, we have $0 \le \det(I - Q) \le 1$, where $I$ is an identity matrix and $\det(\cdot)$ is the determinant of a matrix.

Proof: Before proving this lemma, we note that any given matrix $Q$ has the following property: for any $i, j$ such that $1 \le i < j \le n$, switching the $i$-th row with the $j$-th row and then switching the $i$-th column with the $j$-th column leaves the determinant of $K = I - Q$ unchanged. Moreover, each entry of $Q$ is still from $\{0, \frac{1}{2}, 1\}$ and the sum of each row of $Q$ is still at most 1. We call the transform above an equivalent transform of $Q$.

Let's prove this lemma by induction. When $n = 1$, we have that

$$Q = (0) \text{ or } Q = (\tfrac{1}{2}) \text{ or } Q = (1).$$

In all of the cases, we have $0 \le \det(I - Q) \le 1$.

Assume the result of the lemma holds for $(n - 1) \times (n - 1)$ matrices; we want to prove that this result also holds for $n \times n$ matrices. Now, given an $n \times n$ matrix $Q$, according to the definition in the lemma, we know that the sum of all the entries in $Q$ is at most $n$. As a result, there exists a column such that the sum of the entries in the column is at most 1. Using an equivalent transform, we have that



• The sum of the entries in the 1st column of $Q$ is at most 1.

• The sum of the entries in each row of $Q$ is at most 1.

Now, for the 1st column of $I - Q$, let's continue using equivalent transforms to move all the non-zero entries to the beginning of this column. The possible non-zero entry sets of the 1st column of $I - Q$ are

$$\emptyset,\ \{\tfrac{1}{2}\},\ \{1\},\ \{\tfrac{1}{2}, -\tfrac{1}{2}\},\ \{1, -\tfrac{1}{2}\},\ \{1, -1\},\ \{1, -\tfrac{1}{2}, -\tfrac{1}{2}\}.$$

For the first three cases, the result in the lemma can be easily proved. In the following proof, we only consider the other cases (let $C_1$ denote the non-zero entry set for the 1st column of $I - Q$):

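(1) $C_1 = \{\frac{1}{2}, -\frac{1}{2}\}$.

In this case, we can write $Q$ as

$$Q = \begin{pmatrix} \frac{1}{2} & A \\ \frac{1}{2} & B \\ 0 & C \end{pmatrix},$$

where $A$ has at most one non-zero entry $\frac{1}{2}$, the same as $B$. Let $E_1 = (1\ 0\ \cdots\ 0)$ be a row vector of length $n - 1$, and let $I_2$ be the $(n - 2) \times (n - 1)$ matrix obtained from the identity matrix $I_{n-1}$ by deleting its first row. Then we have

$$\det(I - Q) = \det\begin{pmatrix} \frac{1}{2} & -A \\ -\frac{1}{2} & E_1 - B \\ 0 & I_2 - C \end{pmatrix} = \det\begin{pmatrix} \frac{1}{2} & -A \\ 0 & E_1 - A - B \\ 0 & I_2 - C \end{pmatrix} = \frac{1}{2}\det\begin{pmatrix} E_1 - A - B \\ I_2 - C \end{pmatrix}.$$

Let $D = A + B$. Since both $A$ and $B$ have at most one non-zero entry $\frac{1}{2}$, we know that each entry of $D$ is from $\{0, \frac{1}{2}, 1\}$, and the sum of all its entries is at most one. According to our assumption, we know that

$$0 \le \det\left(I - \begin{pmatrix} D \\ C \end{pmatrix}\right) \le 1.$$

As a result, we have

$$0 \le \det(I - Q) \le \frac{1}{2}.$$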

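(2) $C_1 = \{1, -\frac{1}{2}\}$.

In this case, we can write $Q$ as

$$Q = \begin{pmatrix} 0 & A \\ \frac{1}{2} & B \\ 0 & C \end{pmatrix},$$

where $B$ has at most one non-zero entry $\frac{1}{2}$. Then, with $E_1$ and $I_2$ as in case (1),

$$\det(I - Q) = \det\begin{pmatrix} 1 & -A \\ -\frac{1}{2} & E_1 - B \\ 0 & I_2 - C \end{pmatrix} = \frac{1}{2}\det\begin{pmatrix} 2E_1 - A - 2B \\ I_2 - C \end{pmatrix}$$

$$= \frac{1}{2}\det\begin{pmatrix} E_1 - A \\ I_2 - C \end{pmatrix} + \frac{1}{2}\det\begin{pmatrix} E_1 - 2B \\ I_2 - C \end{pmatrix} = \frac{1}{2}\det\left(I - \begin{pmatrix} A \\ C \end{pmatrix}\right) + \frac{1}{2}\det\left(I - \begin{pmatrix} 2B \\ C \end{pmatrix}\right).$$

According to our assumption,

$$0 \le \det\left(I - \begin{pmatrix} A \\ C \end{pmatrix}\right) \le 1, \qquad 0 \le \det\left(I - \begin{pmatrix} 2B \\ C \end{pmatrix}\right) \le 1,$$

so $\det(I - Q)$ is also bounded by 0 and 1.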

(3) $C_1 = \{1, -1\}$.

Using the same argument as in case (1), we can get the result in the lemma.

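(4) $C_1 = \{1, -\frac{1}{2}, -\frac{1}{2}\}$.

In this case, we can write $Q$ as

$$Q = \begin{pmatrix} 0 & A \\ \frac{1}{2} & B \\ \frac{1}{2} & C \\ 0 & D \end{pmatrix}.$$

Let $E_1 = (1\ 0\ 0\ \cdots\ 0)$ and $E_2 = (0\ 1\ 0\ \cdots\ 0)$ be row vectors of length $n - 1$, and let $I_3$ be the $(n - 3) \times (n - 1)$ matrix obtained from the identity matrix $I_{n-1}$ by deleting its first two rows. Then

$$I - Q = \begin{pmatrix} 1 & -A \\ -\frac{1}{2} & E_1 - B \\ -\frac{1}{2} & E_2 - C \\ 0 & I_3 - D \end{pmatrix},$$

and, expanding along the first column,

$$\det(I - Q) = \det\begin{pmatrix} E_1 - B \\ E_2 - C \\ I_3 - D \end{pmatrix} + \frac{1}{2}\det\begin{pmatrix} -A \\ E_2 - C \\ I_3 - D \end{pmatrix} + \frac{1}{2}\det\begin{pmatrix} E_1 - B \\ -A \\ I_3 - D \end{pmatrix}.$$

Now, we can write $A = E + F$ such that both $E$ and $F$ have at most one non-zero entry, which is $\frac{1}{2}$. Therefore, by the linearity of the determinant in each row,

$$\det\begin{pmatrix} E_1 - B - E \\ E_2 - C - F \\ I_3 - D \end{pmatrix} = \det\begin{pmatrix} E_1 - B \\ E_2 - C \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} -E \\ E_2 - C \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} E_1 - B \\ -F \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} -E \\ -F \\ I_3 - D \end{pmatrix}$$

and

$$\det\begin{pmatrix} E_1 - B - F \\ E_2 - C - E \\ I_3 - D \end{pmatrix} = \det\begin{pmatrix} E_1 - B \\ E_2 - C \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} -F \\ E_2 - C \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} E_1 - B \\ -E \\ I_3 - D \end{pmatrix} + \det\begin{pmatrix} -F \\ -E \\ I_3 - D \end{pmatrix}.$$

Since exchanging two rows changes the sign of a determinant, the last terms of these two expressions cancel when the expressions are added. Finally, we can get that

$$\det(I - Q) = \frac{1}{2}\det\left[I - \begin{pmatrix} B + E \\ C + F \\ D \end{pmatrix}\right] + \frac{1}{2}\det\left[I - \begin{pmatrix} B + F \\ C + E \\ D \end{pmatrix}\right].$$

According to our assumption, we have that

$$0 \le \det\left[I - \begin{pmatrix} B + E \\ C + F \\ D \end{pmatrix}\right] \le 1,$$

$$0 \le \det\left[I - \begin{pmatrix} B + F \\ C + E \\ D \end{pmatrix}\right] \le 1.$$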

Therefore, the result of this lemma holds. 

This completes the proof. ■ 
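For small $n$ the statement of Lemma 1 can also be verified exhaustively. The sketch below is a numerical sanity check and not a substitute for the proof; the names `det` and `check_lemma` are hypothetical, and the determinant is computed by Gaussian elimination over exact fractions.

```python
from fractions import Fraction
from itertools import product


def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # a zero column means det = 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result                # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def check_lemma(n):
    """True if 0 <= det(I - Q) <= 1 for every admissible n x n matrix Q."""
    values = (Fraction(0), Fraction(1, 2), Fraction(1))
    rows = [r for r in product(values, repeat=n) if sum(r) <= 1]
    for q in product(rows, repeat=n):
        i_minus_q = [[Fraction(int(i == j)) - q[i][j] for j in range(n)]
                     for i in range(n)]
        if not 0 <= det(i_minus_q) <= 1:
            return False
    return True


print(all(check_lemma(n) for n in (1, 2, 3)))   # True
```

For $n = 3$ there are 10 admissible rows and hence 1000 matrices, so the enumeration is fast; it grows quickly for larger $n$.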

Lemma 2. Given a desired probability distribution $\{q_1, q_2, \ldots, q_m\}$ and $m < 6$, the Huffman procedure can construct a binary tree such that

1) It has $m$ leaves with weights $q_1, q_2, \ldots, q_m$.

2) $L = \sum_{j=1}^{m-1} \log_2 w_j$ is minimized, where $w_j$ is the weight of the $j$-th parent node in a binary tree with $m$ leaves.

Proof: It is easy to prove that the case for $m = 3$ or $m = 4$ is true. In the following proof, we only show the case for $m = 5$ briefly. W.l.o.g., we assume $q_1 \le q_2 \le \cdots \le q_5$. Without considering the order of the leaves, we have only two binary-tree structures, as shown in Fig. 15.

Fig. 15. Two possible tree structures (a) and (b) for $m = 5$.

In both of the structures, for any pair of leaves $x_i$ and $x_j$, if $x_i$'s sibling is $x_j$'s ancestor then $x_i \ge x_j$. Otherwise, we can switch the positions of $x_i$ and $x_j$ to reduce $\sum_{j=1}^{m-1} \log_2 w_j$. So if the tree structure (a) in Fig. 15 is the optimal one, we have $x_1 = q_1, x_2 = q_2$ or $x_1 = q_2, x_2 = q_1$. Now, we will show that if the tree structure (b) in Fig. 15 is the optimal one, we also have $x_1 = q_1, x_2 = q_2$ or $x_1 = q_2, x_2 = q_1$.

For the tree structure (b), we have the following relations:

$$x_3 \ge \max\{x_1, x_2\},$$

$$x_4 + x_5 \ge \max\{x_1 + x_2, x_3\}.$$

Then $q_1$ and $q_2$ are in $\{x_1, x_2, x_4, x_5\}$ and $x_1 + x_2 \le \frac{1 - x_3}{2}$. Let $x = x_1 + x_2$; then $L$ can be written as

$$L = \min\, \log(x_1 + x_2) + \log(x_1 + x_2 + x_3) + \log(x_4 + x_5)$$
$$= \min\, \log\bigl((x_1 + x_2)(x_1 + x_2 + x_3)(1 - x_1 - x_2 - x_3)\bigr)$$
$$= \min\, \log\bigl(x(1 - x_3 - x)(x + x_3)\bigr).$$

So we can minimize $x(1 - x_3 - x)(x + x_3)$ instead of minimizing $L$. Fixing $x_3$, we can see that $x(1 - x_3 - x)$ increases as $x$ increases when $x \le \frac{1 - x_3}{2}$; $(x + x_3)$ also increases as $x$ increases. So fixing $x_3$, $x(1 - x_3 - x)(x + x_3)$ is minimized if and only if $x$ is minimized, which implies $x_1 = q_1, x_2 = q_2$ or $x_1 = q_2, x_2 = q_1$.

Based on the discussion above, we know that in the optimal tree, $q_1$ and $q_2$ must be siblings. Let's replace $q_1$, $q_2$ and their parent node by a leaf with weight $q_1 + q_2$. Then we get an optimal tree for the distribution $\{q_1 + q_2, q_3, q_4, q_5\}$, whose $L$ value is $L_4^*$. Assume the optimal $L$ value for the distribution $\{q_1, q_2, q_3, q_4, q_5\}$ is $L_5^*$; then

$$L_5^* = L_4^* + \log_2(q_1 + q_2).$$

Let's consider a tree constructed by the Huffman procedure for $\{q_1, q_2, q_3, q_4, q_5\}$, whose $L$ value is $L_5$. We want to show that this tree is optimal. According to the procedure, we know that $q_1$ and $q_2$ are also siblings. By combining $q_1$ and $q_2$ into a leaf with weight $q_1 + q_2$, we get a new tree. This new tree can be constructed by applying the Huffman procedure to the distribution $\{q_1 + q_2, q_3, q_4, q_5\}$. Due to our assumption for $m = 4$, it is optimal; as a result, the following is true,

$$L_5 = L_4^* + \log_2(q_1 + q_2).$$

Finally, we obtain $L_5 = L_5^*$, which shows that the $L$ value of the tree constructed by the Huffman procedure is minimized when $m = 5$.

This completes the proof. ■


